This research addresses a significant problem in credit risk analysis in the
banking sector: class imbalance, which limits a model's ability to accurately
identify risk in the 'Charged Off' class. As a solution, we propose a stacked
ensemble approach that uses the synthetic minority over-sampling technique
(SMOTE) to balance the class distribution. Experiments were conducted by
applying SMOTE to the training data before training the credit model using
gradient boosting (XGBoost) and random forest (RF) algorithms in a single
ensemble. The results show significant improvements in precision, recall, and
F1-score after applying SMOTE to the unbalanced classes. The updated model
achieved an accuracy of 0.97 on the resampled training data. This research
identifies class imbalance as a major challenge in credit risk analysis. The
application of SMOTE in a stacked ensemble was found to be effective in
improving model performance, contributing to the development of more reliable
credit models for better risk management and revenue generation in financial
institutions.

1. INTRODUCTION 

In the rapidly evolving landscape of credit risk management within the banking industry, there exists
a persistent challenge: class imbalance in loan datasets [1]. The volume of fully paid loans far surpasses
that of charged off loans, which reduces the efficacy of models in identifying ongoing credit risk [2]. This
persistent issue necessitates innovative solutions to strike a balance between safety and profitability [3].
Against this background, credit risk management has become a critical aspect for banking institutions,
demanding constant innovation to effectively handle the challenges posed by unbalanced datasets [4]. The
growing need for financial institutions to minimise credit risk while increasing revenue adds urgency to
the exploration of new methodologies [5].

In the realm of credit analysis, the issue of class imbalance serves as a focal point for this research 
[6]. The skewed distribution poses a challenge, as inaccurate credit evaluations can have profound financial 
implications and erode customer confidence [7]. The substantial impact of bad debts on financial stability 
underscores the criticality of making accurate decisions to mitigate risks [8]. This is the crux of the problem 
that motivates our study. In addressing class imbalance in datasets, traditional machine learning models often 
struggle to capture the nuances of the minority class, leading to suboptimal performance, especially in the 
identification of high-risk loans [9]. The need for a robust solution becomes evident, prompting the 
exploration of innovative techniques that go beyond conventional methodologies. 

In the contemporary landscape of credit risk management, addressing class imbalance in loan datasets 
has emerged as a crucial focal point [10]. As indicated by Ziemba et al. [11] the prevalence of fully paid loans 
significantly outweighs charged off loans, presenting a challenge in accurately identifying ongoing credit risks. 
Traditional machine learning models face limitations in capturing the nuances of minority classes, leading to 
suboptimal performance in high-risk loan identification [12]. This issue has prompted researchers to explore 
innovative techniques, such as the synthetic minority over-sampling technique (SMOTE), to rebalance
datasets [13]. To address the prevalent issue of class imbalance, particularly in the banking industry, the study
proposes an approach built on SMOTE. Notably, the application of SMOTE is not confined to individual
models but extends to the ensemble level, particularly within the framework of stacking ensemble
models [14]. This research seeks to systematically explore the integration of SMOTE with stacking ensemble 
models, shedding light on the impact of this combination on decision-making within the ensemble. By 
enriching the dataset through SMOTE, we aim to create a more balanced representation, enabling the
underlying base models, gradient boosting (XGBoost) and random forest (RF), to learn from a more diverse
and evenly distributed information pool. This is a critical step in ensemble model development, given
that class balance and diversity in the dataset greatly affect ensemble performance [15].

The primary contribution of this research lies in leveraging SMOTE within the context of ensemble 
stacking, aiming to substantially improve credit risk management practices. Our approach focuses on 
achieving a balance between precision and recall, thereby providing a more comprehensive and effective 
solution to credit risk assessment. Through this innovative contribution, we not only aspire to enhance model 
performance metrics but also deepen the understanding of how SMOTE, when applied at the ensemble level, 
can provide significant benefits. 


2. METHOD 

In this study, we propose a stacking ensemble model that uses SMOTE to improve risk management
and increase revenue on bank credit data, as shown in Figure 1. A major challenge in credit
analysis is class imbalance, where the number of fully paid loans far outnumbers the charged off loans.
Therefore, we apply SMOTE to the dataset to even out the class distribution, improving the model's ability to
identify the true risk. Our method uses two base models, RF and XGBoost, which are then
combined in a main model using the voting classifier (VC) technique. SMOTE is performed on
the training data to ensure the main model receives balanced information from each class. Figure 1
visualizes the steps of ensemble model development and SMOTE application on the credit dataset, which is
expected to provide a more reliable and accurate solution in credit risk management.

Figure 1. Proposed stacking ensemble method with SMOTE


2.1. Dataset 

The dataset used in this research is a banking credit dataset obtained from Kaggle. This dataset
consists of 19 features, namely loan ID, customer ID, loan status, current loan amount, term, credit score,
annual income, years in current job, home ownership, purpose, monthly debt, years of credit history, months
since last delinquent, number of open accounts, number of credit problems, current credit balance, maximum
open credit, bankruptcies, and tax liens. This dataset provides an overview of borrower profiles and
credit-related factors that can be used in risk analysis. A description of each feature in this dataset is
provided in Table 1.


Table 1. Dataset features 


Feature                        Description
Loan ID                        Unique identifier for each loan record.
Customer ID                    Unique identifier for each customer.
Loan status                    Binary classification of the loan status (fully paid or charged off).
Current loan amount            The current approved loan amount.
Term                           Duration of the loan (short term or long term).
Credit score                   The credit score of the loan applicant.
Annual income                  The annual income of the loan applicant.
Years in current job           The number of years the applicant has been in their current job.
Home ownership                 The status of home ownership (own home, mortgage, or rent).
Purpose                        The purpose of the loan.
Monthly debt                   The total monthly debt payments.
Years of credit history        The number of years of credit history.
Months since last delinquent   The number of months since the last delinquent payment.
Number of open accounts        The number of open credit accounts.
Number of credit problems      The number of credit problems.
Current credit balance         The current outstanding credit balance.
Maximum open credit            The maximum open credit amount.
Bankruptcies                   The number of bankruptcies.
Tax liens                      The number of tax liens.


2.2. Preprocessing 

Before model building, the dataset is preprocessed. This process
includes checking for duplicate records and handling missing values [16]. The check found no
duplicate entries in the dataset. However, a
number of features were found to have missing values, as shown in Table 2. To address these gaps, the
simple imputer method was used with the mean strategy [17]: the missing values within each feature are
replaced with that feature's mean, computed over its available values as in (1) [18]. This
makes the dataset more complete, mitigating potential biases introduced by missing data during subsequent
analysis, and allows us to proceed to the next stage of ensemble model development.


\[ \text{mean} = \frac{\sum (\text{available values})}{\text{number of non-empty entries}} \qquad (1) \]
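The mean imputation in (1) can be illustrated with a minimal sketch. The "simple imputer" named above presumably corresponds to scikit-learn's `SimpleImputer` with `strategy="mean"`, but the source does not say so explicitly; the version below uses only the Python standard library. The function name `impute_mean` and the sample credit scores are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def impute_mean(column):
    """Replace missing entries (None) with the mean of the available values."""
    available = [v for v in column if v is not None]
    if not available:
        raise ValueError("column has no available values to average")
    fill = mean(available)  # sum of available values / number of non-empty entries
    return [fill if v is None else v for v in column]


# Hypothetical credit scores with two missing entries.
credit_score = [720, None, 680, 750, None]
imputed = impute_mean(credit_score)
```

Both missing entries receive the same fill value, the mean of the three observed scores, and the observed scores are left unchanged.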


Table 2. Summary of missing values before imputation 


Feature                        Missing values
Credit score                   1947
Annual income                  1947
Months since last delinquent   5331
Bankruptcies                   17
Tax liens                      1


2.3. Exploratory data analysis 

The initial analysis of this dataset focuses on the distribution of 'Loan Status', which is
the target variable in this study. The 10,058 records fall into two classes, namely
'Fully Paid' and 'Charged Off'. The analysis shows that 7,744 entries are loans that have been fully
repaid ('Fully Paid'), while 2,314 entries are loans that were not repaid ('Charged Off'), a split of
roughly 77% to 23%. Figure 2 compares the 'Fully Paid' and 'Charged Off'
classes in the form of a bar chart. The chart shows how unbalanced
the distribution is between loans that have been fully repaid and those that have not. This
information is an important cornerstone in the initial understanding of the data characteristics, allowing
researchers to identify class imbalances that might affect the performance of the classification model.

Next, we analyzed the distribution of risk categories based on credit scores. In this categorization,
we divided the data into three categories: low risk, medium risk, and high risk. A borrower with a credit
score greater than or equal to 750 is considered low risk. Credit scores in the range of 700 to
749 are categorized as medium risk, and credit scores below 700 are considered high risk. The results
of this analysis show 667 borrowers with low risk, 3,591 borrowers with medium risk, and the
remaining 5,800 borrowers categorized as high risk. A visualization of this analysis can be seen in
Figure 3.
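The credit-score risk categorization described in this section can be written as a small function. This is a sketch under stated assumptions: the thresholds (750 and 700) come from the text, while the function name `risk_category`, the handling of non-integer scores between 749 and 750 (treated here as medium risk), and the sample scores are hypothetical.

```python
from collections import Counter


def risk_category(credit_score):
    """Map a credit score to the three risk categories used in the text."""
    if credit_score >= 750:
        return "low"
    if credit_score >= 700:
        return "medium"
    return "high"


# Hypothetical scores chosen to exercise each boundary.
scores = [780, 749, 700, 699, 750, 640]
counts = Counter(risk_category(s) for s in scores)
```

Applied to the full dataset, a tally of this kind would yield the per-category counts reported above.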
Figure 2. Loan status distribution ('Fully Paid' vs. 'Charged Off')

Figure 3. ROC curve


2.4. SMOTE 

SMOTE is a technique used to address class imbalance in machine learning datasets, a common
issue where one class is significantly underrepresented. In our loan status dataset, SMOTE is applied to
balance instances between the 'Fully Paid' and 'Charged Off' classes. The fundamental concept of SMOTE
involves generating synthetic samples for the minority class by interpolating between existing instances [19]. For
a given minority class sample X_i, SMOTE selects its k nearest minority-class neighbors (typically k=5) and
creates synthetic samples using the formula:


\[ X_{\text{new}} = X_i + \delta \times (X_j - X_i) \qquad (2) \]


Here, X_i is the original minority class sample, X_j is a randomly chosen neighbor, and δ is a random value
between 0 and 1, determining the interpolation extent [20]. This process is repeated to achieve the desired
class balance. SMOTE enhances the model’s generalization and prediction accuracy on the minority class by 
introducing synthetic samples [21]. This approach is particularly beneficial in scenarios like credit risk 
analysis, where accurate identification of default cases is crucial for effective risk management. 
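The interpolation step in (2) can be sketched in a few lines. This is an illustrative, standard-library-only version and not the implementation used in the experiments, which the source does not name; in practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used. The function names `smote_sample` and `oversample`, the fixed seed, and the toy two-dimensional points are assumptions made for the example.

```python
import math
import random


def smote_sample(minority, i, k=5, rng=random):
    """Create one synthetic point from minority[i] via X_new = X_i + delta * (X_j - X_i)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    x_i = minority[i]
    # Rank the other minority samples by Euclidean distance and keep the k nearest.
    others = [x for idx, x in enumerate(minority) if idx != i]
    neighbors = sorted(others, key=lambda x: math.dist(x_i, x))[:k]
    x_j = rng.choice(neighbors)  # randomly chosen neighbor X_j
    delta = rng.random()         # random interpolation factor in [0, 1)
    return [a + delta * (b - a) for a, b in zip(x_i, x_j)]


def oversample(minority, n_target, k=5, seed=0):
    """Add synthetic minority points until the class holds n_target samples."""
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < n_target:
        i = rng.randrange(len(minority))
        synthetic.append(smote_sample(minority, i, k, rng))
    return minority + synthetic


# Toy minority class with two features per sample (hypothetical values).
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [2.5, 2.5]]
balanced = oversample(minority, n_target=10, k=2)
```

Because every synthetic point lies on the line segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.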


2.5. Stacking ensemble model 


The method applied in this research uses the concept of a stacking ensemble, a technique that
combines predictions from several models [22]. The two base models used are the XGBoost classifier (GBC) and
the random forest classifier (RFC). This combination was chosen because each model has
its own advantages and disadvantages. According to Putrada et al. [23], the GBC tends to be accurate and
can handle the complexity of non-linear relationships, while the RFC has the advantage of resisting
overfitting and being robust to unbalanced data [24]. Combining XGBoost and RF as base
models is expected to provide better results because they complement each other's weaknesses: XGBoost can
"learn" from the errors of previous models, while RF can help reduce variance and improve generalization. The
base models are then combined in the VC as the main model. In this concept, the final decision is taken
based on the majority of votes from the base models.

RF works by creating a number of decision trees on the training data and combining the prediction 
results from each tree to give the final result [25]. XGBoost works by sequentially combining a number of 
weak models, in this case, decision trees. The resulting model places more emphasis on data that was deemed 
difficult to predict by the previous model. The general formula for XGBoost can be represented as (3): 


\[ F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x) \qquad (3) \]


Here, F_m(x) is the model prediction at the m-th iteration, F_{m-1}(x) is the model prediction from the previous
iteration, η is the learning rate, and h_m(x) is the weak model at the m-th iteration. VC combines predictions
from multiple models by assigning weights to each model. One type of VC is hard voting, which selects the
class based on the majority vote. The formula can be represented as (4):


\[ VC(x) = \arg\max_{c} \sum_{i=1}^{N} w_i \, I(y_i(x) = c) \qquad (4) \]


Here, VC(x) is the prediction of the VC for sample x, N is the number of models, w_i is the weight for the i-th
model, and I(y_i(x) = c) is the indicator function that is 1 if the i-th model predicts class c for sample x and 0
otherwise.
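Weighted hard voting as defined in (4) can be sketched as follows. The function name `hard_vote`, the example weights, and the tie-breaking rule (the class seen first wins) are assumptions for illustration; the source does not state which weights were used or how ties are resolved.

```python
from collections import defaultdict


def hard_vote(predictions, weights=None):
    """Return the class c that maximizes sum_i w_i * I(y_i(x) = c)."""
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weights give a plain majority vote
    if len(weights) != len(predictions):
        raise ValueError("one weight is required per model prediction")
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w  # the indicator selects only the predicted class
    # On a tie, max() keeps the class that was inserted first.
    return max(totals, key=totals.get)


# Two base models that disagree, with hypothetical weights.
weighted = hard_vote(["Charged Off", "Fully Paid"], weights=[0.6, 0.4])
# Three equally weighted votes.
majority = hard_vote(["Fully Paid", "Charged Off", "Charged Off"])
# Two equally weighted models that disagree produce a tie.
tied = hard_vote(["Fully Paid", "Charged Off"])
```

With only two equally weighted base models, every disagreement is a tie, so the outcome then depends entirely on the weights or on the tie-breaking rule.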

The formula for the stacking ensemble combination is: 


\[ \text{Ensemble Prediction} = VC(GBC(x), RFC(x)) \qquad (5) \]


In other words, the final result of the stacking ensemble is the prediction of the VC using the predictions from 
the GBC and RFC as inputs. 


3. RESULTS AND DISCUSSION 

By applying SMOTE to address class imbalance, the distribution of loan status is successfully
balanced. Before the application of SMOTE, the dataset had 7,744 instances of 'Fully Paid' and 2,314
instances of 'Charged Off'. After the application of SMOTE, the two classes were
balanced at 6,185 instances each for 'Fully Paid' and 'Charged Off', creating a more
balanced and robust training set for the classification model. Prior to the SMOTE process, the classification
report indicates good model performance on the 'Fully Paid' class, with an F1-score of 0.90
(precision 0.82, recall 0.99). However, on the 'Charged Off' class, the model's performance dropped
significantly, with an F1-score of 0.41 driven by a recall of only 0.27. This shows that the model has
difficulty identifying and predicting the 'Charged Off' class, a symptom of class imbalance that requires
special handling such as the application of SMOTE to improve performance on minority classes. The precision,
recall, and F1-score before and after the SMOTE process are compared in Table 3.


Table 3. Comparison of results before and after SMOTE 


Stage          Class         Precision   Recall   F1-score   Support
Before SMOTE   Charged off   0.84        0.27     0.41       453
Before SMOTE   Fully paid    0.82        0.99     0.90       1559
After SMOTE    Charged off   1.00        0.94     0.97       6185
After SMOTE    Fully paid    0.94        1.00     0.97       6185
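As a consistency check on Table 3, the F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the values printed in the table; the function name `f1_score` and the dictionary layout are illustrative choices.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in Table 3.
table3 = {
    ("before", "Charged off"): (0.84, 0.27),
    ("before", "Fully paid"): (0.82, 0.99),
    ("after", "Charged off"): (1.00, 0.94),
    ("after", "Fully paid"): (0.94, 1.00),
}

f1 = {key: round(f1_score(p, r), 2) for key, (p, r) in table3.items()}
```

The recomputed values (0.41, 0.90, 0.97, and 0.97) agree with the F1-score column of the table.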


After applying the SMOTE technique to the dataset, the prediction using the main model showed very
satisfactory results. The classification report on the resampled training data shows an accuracy of 0.97,
with high precision, recall, and F1-score values in both the 'Charged Off' and 'Fully Paid' classes. In
particular, the 'Charged Off' class achieved a recall of 0.94, indicating the model's ability to correctly
recognize high-risk loan cases. These results illustrate that the use of SMOTE addresses class
imbalance and improves model performance, especially in identifying potentially problematic loans
('Charged Off'). This indicates that addressing class imbalance with SMOTE contributes significantly to
improving the accuracy and predictive ability of the model.

Before the SMOTE process, the model had an ROC AUC of 0.76 and a G-mean of 0.52, where the G-mean is
the geometric mean of sensitivity and specificity. After implementing SMOTE, model performance improved
significantly, with an ROC AUC of 1.00 and a G-mean of 0.97. In this context, an ROC AUC
below 1.00 indicates that the model performance can still be improved, and a low G-mean reflects an
imbalance between sensitivity and specificity.
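The reported G-mean values can be reproduced from the per-class recalls in Table 3. The sketch assumes that 'Charged Off' is the positive class, so its recall is the sensitivity and the 'Fully Paid' recall is the specificity; because the G-mean is symmetric in the two rates, the opposite assignment gives the same result. The function name `g_mean` is an illustrative choice.

```python
import math


def g_mean(sensitivity, specificity):
    """Geometric mean of the true positive rate and the true negative rate."""
    return math.sqrt(sensitivity * specificity)


# Recalls taken from Table 3: 'Charged Off' first, then 'Fully Paid'.
before = g_mean(0.27, 0.99)
after = g_mean(0.94, 1.00)
```

Rounded to two decimals these evaluate to 0.52 and 0.97, matching the figures quoted above. The low value before SMOTE comes almost entirely from the 0.27 recall on the minority class.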

However, after the application of SMOTE, there was a marked improvement in the model
evaluation. The ROC AUC reached a value of 1.00, indicating that the model was able to perfectly
distinguish between positive and negative classes. Meanwhile, the G-mean, which increased to 0.97, illustrates
the excellent balance between sensitivity and specificity. This improvement can be explained by the fact that
SMOTE addressed the class imbalance in the dataset, particularly by improving the
representation of the minority class 'Charged Off'. By increasing the number of samples in the minority
class, the model can learn better and produce more optimized results. In Figure 4, the G-mean curve,
labeled 'best = 1.00', indicates that the model has an excellent balance between true positive rate
(recall) and true negative rate (specificity). In this context, a G-mean value of 1.00 indicates that the model
can optimally classify both classes (positive and negative) without sacrificing performance in either class.
Meanwhile, 'Best Threshold (G-Mean) = 0.59' is the threshold value that gives the best G-mean performance.
This threshold value can be used as a decision boundary at which the model classifies an instance as the
positive or negative class. So, by using this threshold value, the model can achieve the optimal performance
indicated by a G-mean of 1.00.


[Plot: G-mean curve (best = 1.00) over the decision threshold; best threshold (G-mean) = 0.59; x-axis: Threshold]


Figure 4. ROC curve 
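The threshold selection described for the G-mean curve can be sketched as a sweep over candidate thresholds that keeps the one with the highest G-mean. This is a minimal illustration and not the procedure used to produce the figure: the function names, the rule that a score at or above the threshold is classified as positive, and the toy labels and scores are all assumptions.

```python
import math


def g_mean_at(threshold, y_true, scores):
    """G-mean obtained when scores >= threshold are classified as positive (1)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall)
    tnr = tn / (tn + fp) if tn + fp else 0.0  # specificity
    return math.sqrt(tpr * tnr)


def best_threshold(y_true, scores):
    """Return the observed score that maximizes the G-mean when used as threshold."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: g_mean_at(t, y_true, scores))


# Toy labels (1 = positive class) and predicted probabilities, hypothetical values.
y_true = [0, 0, 1, 0, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.55, 0.60, 0.80, 0.90]
best = best_threshold(y_true, scores)
```

On this toy data the sweep selects a threshold of 0.60, where all negatives are rejected and three of the four positives are kept.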


Al-Islam et al. [26] used SMOTE oversampling to handle imbalanced data
together with an ensemble stacking technique built on linear regression (LR), support vector machine (SVM),
K-nearest neighbor (KNN), RF, and XGBoost classification models. The ensemble stacking models we used
were GBC, RFC, and VC. We present the results of this model comparison in Table 4.


Table 4. Comparison of stacking ensemble research results 


Method                          Accuracy   Precision   Recall
GB, RF, and VC                  0.97       1.00        1.00
LR, SVM, KNN, RF, and XGBoost   0.91       0.90        0.90


4. CONCLUSION AND FUTURE WORK 

In this study, we successfully explored and implemented a stacking ensemble approach involving 
base models, such as GBC and RFC as well as the main model VC. The use of SMOTE techniques to address 
data imbalance also proved effective in improving the performance of our classification models. 
Experimental results show significant improvements in evaluation metrics such as ROC AUC and G-mean, 
validating the effectiveness of this approach in handling classification cases on imbalanced datasets. While 
this study provides a better understanding of the application of stacking ensemble and SMOTE, several
areas remain for further research. First, further exploration and testing of various base models and parameter
settings may provide additional insights into the best combination for specific datasets. Second, more
advanced clustering techniques or more complex models could be integrated into the stacking
ensemble. Third, in-depth research on feature engineering and feature selection can improve the
model's ability to recognize patterns in more complex datasets. Finally, exploring more advanced
evaluation methods and developing model interpretation techniques could be an interesting research
direction. Thus, future research can bring further contributions in dealing with classification challenges on
imbalanced datasets.